Molecular Characterization of the D Surface Protein Gene Subfamily in Paramecium primaurelia

FLORENCE M. BOURGAIN-GUGLIELMETTI¹ and FRANCOIS M. CARON*²
75230 Paris, Cedex 05, France

ABSTRACT. When Paramecium primaurelia expresses the D serotype, a major high molecular weight mRNA species is detected in the cytoplasm. Using the cDNA derived from this mRNA as a probe, three very similar genes, Dα, Dβ and Dγ, were cloned. Of these three genes, we show that only the Dα mRNA is present in the cytoplasm of cells expressing the D serotype, and that it corresponds to the major mRNA species. The nucleotide sequence of the entire coding region of the Dα gene, as well as of the upstream and downstream sequences, has been determined. The 7632-nucleotide open reading frame encodes a putative protein that displays the characteristic cysteine residue periodicity of Paramecium surface antigens but does not contain central tandemly repeated sequences. Partial sequences of the two nonexpressed genes Dβ and Dγ show a high percentage of identity (90%-95%) with the Dα gene, suggesting that the Dβ and Dγ genes are either very similar surface protein genes whose transcription is repressed through mutual exclusion, or pseudogenes. A region of variable DNA rearrangement was identified 1 kb upstream of the Dγ gene. The different macronuclear versions of this region arise from the same micronuclear locus by alternative excision of internal eliminated sequences during macronuclear development.
Supplementary key words. Alternative DNA rearrangements. 



PARAMECIUM primaurelia possesses a family of surface antigen genes. In most cases, only one of these genes is expressed (exclusion rule), and the corresponding protein covers the external surface of the cell, constituting the cell coat. Two surface antigens, called G and D, have been extensively studied by biochemical, genetic and immunological means (for a review see [29]). The G protein is stably expressed in the 15° C-28° C temperature range, whereas the D protein is expressed above 30° C. At the molecular level, the G protein gene has been cloned and entirely sequenced from two geographically distinct isolates, strains 156 and 168 [26, 27]. The two encoded allelic proteins are huge, with molecular masses in the 275 kDa range, and display along their whole sequence a similar pseudoperiodic structure. This pseudoperiodic structure is shared by all sequenced Paramecium surface antigen genes. At the center of the amino acid sequences of these two surface antigens, 156G and 168G, four almost perfect repeats of about 70 residues are present and appear to be a distinctive feature of these proteins: indeed, the similarity between the two allelic sequences is extremely high (98%) except in these central repeats, where it drops to 60% [8]. Several immunological arguments indicate that these four central repeats form four identical domains, which are the only parts of the molecule accessible from the external medium [3, 5, 26].

J. EUK. MICROBIOL., VOL. 43, NO. 4, JULY-AUGUST 1996

Paramecium primaurelia shares the common characteristic of ciliates: nuclear dimorphism. In each cell, two types of nuclei coexist: a micronucleus, which is essentially transcriptionally inactive and acts as a germinal nucleus, and a macronucleus, which is transcriptionally active and acts as a somatic nucleus. After each sexual process, a new macronucleus is made from a copy of the zygotic nucleus, and the old macronucleus is destroyed. A complex DNA rearrangement process takes place during the biogenesis of this macronucleus, consisting of three types of modifications: internal sequence elimination, chromosome fragmentation and DNA amplification. Different types of eliminated micronuclear sequences have been reported in the literature. Among them, one type, called IES (internal eliminated sequence), consists of unique DNA sequences rich in A and T bases and bordered by two direct repeats of the dinucleotide 5'-TA-3'. IESs are eliminated by an excision-religation mechanism that retains only one of the two direct repeats. Although recent efforts have been made to determine the nature of this process (cis-specific sequences, trans-acting protein factors), it remains essentially unknown (for a review, see [33]).
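The excision rule described above (an AT-rich IES bordered by two 5'-TA-3' direct repeats, only one of which survives in the macronucleus) can be illustrated as a simple string operation. This is a minimal sketch under assumed conventions: the helper `excise_ies` is hypothetical, it takes half-open coordinates that include both TA repeats, and the toy sequence is invented. It models only the outcome of excision, not the molecular mechanism, which remains unknown.

```python
def excise_ies(micronuclear: str, start: int, end: int) -> str:
    """Remove a hypothetical IES occupying micronuclear[start:end].

    Assumed convention: the IES begins with 'TA' at `start` and ends with
    'TA' just before `end` (the two flanking direct repeats). Only one copy
    of the TA repeat is left in the macronuclear product.
    """
    ies = micronuclear[start:end]
    if not (ies.startswith("TA") and ies.endswith("TA")):
        raise ValueError("IES boundaries must carry the 5'-TA-3' repeats")
    # Keep the leading TA, drop the rest of the IES including the trailing TA
    return micronuclear[:start + 2] + micronuclear[end:]


# Hypothetical toy locus: flank + TA + AT-rich internal segment + TA + flank
mic = "GGCC" + "TA" + "AATTTAAT" + "TA" + "CCGG"
print(excise_ies(mic, 4, 16))  # GGCCTACCGG
```

Keeping the first TA and discarding the second is an arbitrary choice here; because the two repeats are identical, the product is the same either way.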

In this paper, we have extended the type of structural study already carried out on the G surface protein of strain 156 to the D surface protein of the same strain, which, as mentioned above, is expressed at higher temperature. Using the mRNA sequence of the D protein as a probe, we screened a genomic macronuclear library and cloned three genes whose sequences cross-hybridize strongly with the mRNA sequence. The three genes have been mapped and found to have extremely similar sequences. Only one of them is expressed in the D serotype. The complete nucleotide sequence of the expressed gene has been determined, and from the deduced amino acid sequence we show that the structure of the D protein resembles that of the G protein except for the absence of the central repeats; a surface protein of Paramecium tetraurelia, 51C, has also been reported to lack the central repeats [25]. During the course of this work, an unusual variable region was found close to the 5' end of the coding sequence of one of the two nonexpressed genes. Close inspection of this upstream region shows that it varies from one macronuclear copy to another. The sequences of several of these macronuclear versions have been determined, and we suggest that they could be generated by alternative elimination of IESs.
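The idea that several macronuclear versions could derive from one micronuclear locus by alternative IES elimination can be made concrete with a toy model. The sketch below assumes, purely for illustration, that each candidate IES is either excised or retained independently, so that n non-overlapping candidates give up to 2^n versions. The function names and the sequence are hypothetical; the micronuclear sequence of the real locus is not known.

```python
from itertools import combinations


def excise_all(seq, ies_intervals):
    """Excise each (start, end) IES, keeping one TA repeat per IES.

    Intervals must not overlap; processing them from right to left keeps
    the coordinates of the remaining (leftward) intervals valid.
    """
    for start, end in sorted(ies_intervals, reverse=True):
        seq = seq[:start + 2] + seq[end:]
    return seq


def alternative_versions(seq, candidate_ies):
    """Map every subset of excised IESs to the resulting macronuclear sequence."""
    versions = {}
    for k in range(len(candidate_ies) + 1):
        for subset in combinations(candidate_ies, k):
            versions[subset] = excise_all(seq, subset)
    return versions


# Hypothetical micronuclear toy sequence with two AT-rich candidate IESs
mic = "CC" + "TAAATTTA" + "CC" + "TATTTTTA" + "CC"
for subset, mac in alternative_versions(mic, [(2, 10), (12, 20)]).items():
    print(subset, mac, len(mac))
```

In this toy case the four products have three different lengths, which mirrors, in miniature, the size heterogeneity seen among macronuclear copies.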

MATERIALS AND METHODS 

Cell line and cultivation. Cells from Paramecium primaurelia wild-type strain 156 were grown in "scotch grass" infusion as described by Sonneborn [38], bacterized the day before use with Klebsiella pneumoniae and supplemented with 0.8 µg/ml of β-sitosterol (Merck). Cultures were grown at 24° C or 33° C for expression of the G or D serotype, respectively. Surface antigen expression was determined, and routinely checked during culture expansion, by the immobilization test [2] using antisera raised against either the 156G or the 156D surface antigen. Cultures were used when 100% of the cells expressed a given surface antigen. Cells were collected by centrifugation and washed in Dryl's solution [12]. The compact pellet of cells was either used immediately for DNA extraction or frozen for RNA extraction. In the latter case, pelleted cells were poured dropwise into liquid nitrogen and kept at -80° C.

DNA analysis. Genomic Paramecium DNA was isolated as described previously by Prat [26]. Electrophoresis and Southern blot hybridization were performed according to standard methods [35].

RNA extraction and analysis. Two methods were used to prepare total RNA from frozen pelleted Paramecium cells. The first method was adapted from Chirgwin et al. [9] and is described in Meyer et al. [24]; RNA used in the experiment of Fig. 1 was prepared by this method. The second was method 2 of Meyer et al. [24], with minor modifications; it was used to prepare total RNA for the experiments of Fig. 5.

RNAs were run on methylmercury gels and blotted onto Hybond N+ membranes according to Sambrook [35]. Northern blots were hybridized with oligonucleotides Osr1, Osr2 and Osr3 in the buffer described by Emilsson and Kurland [13].

Stringency determination for oligonucleotide hybridization. To find conditions suitable for the specific hybridization of the three oligonucleotides Osr1, Osr2 and Osr3 (see Fig. 4 for their sequences) with the corresponding identical sequences, we immobilized on nylon membranes, using a dot blot apparatus, 2 ng to 2 µg of recombinant plasmid DNA containing as insert either S1, S2 or S3 (see Fig. 2 for the localization of these fragments). These inserts contain the Osr1, Osr2 and Osr3 sequences, respectively. DNA was denatured in 0.4 M NaOH for 30 min at 65° C prior to filter binding. The filters were successively probed with each oligonucleotide and washed in 0.2× SSC (1× SSC is 150 mM NaCl, 15 mM sodium citrate, pH 7), 0.5% SDS at increasing temperatures for 30 min. At low stringency (45° C), the three oligonucleotides hybridized to each of the plasmids. Above 60° C, the 23-bp oligonucleotide Osr1 hybridized with S1 but not with S2 or S3. Above 55° C, the oligonucleotides Osr2 (23 bp) and Osr3 (25 bp) hybridized with the fragment containing the exact sequence and not with the two others. Thus, under these stringency conditions, oligonucleotides Osr1, Osr2 and Osr3 constitute specific probes for genes Dα, Dβ and Dγ, respectively.
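The specificity criterion used here (the wash temperature at which an oligonucleotide still binds its own fragment but no longer the other two) can be written as a small lookup over the dot blot outcomes. The sketch encodes only the results stated in this paragraph; the dictionary layout and the function name `lowest_specific_temperature` are illustrative. Because the text reports specificity "above" 60° C and 55° C, those values are used as approximate thresholds, and intermediate temperatures that were not described are deliberately left out.

```python
# Dot blot outcomes stated in the text, keyed by wash temperature (°C).
# Only the temperatures explicitly mentioned are encoded (approximate thresholds).
OBSERVED = {
    "Osr1": {45: {"S1", "S2", "S3"}, 60: {"S1"}},
    "Osr2": {45: {"S1", "S2", "S3"}, 55: {"S2"}},
    "Osr3": {45: {"S1", "S2", "S3"}, 55: {"S3"}},
}
TARGET = {"Osr1": "S1", "Osr2": "S2", "Osr3": "S3"}


def lowest_specific_temperature(oligo):
    """Lowest recorded wash temperature at which the oligo binds only its own fragment."""
    for temp in sorted(OBSERVED[oligo]):
        if OBSERVED[oligo][temp] == {TARGET[oligo]}:
            return temp
    return None  # no specific condition among the recorded temperatures


for name in TARGET:
    print(name, "->", TARGET[name], "specific from about",
          lowest_specific_temperature(name), "°C")
```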

Cloning of probe pEDx. The method used was described in Meyer et al. [24]; a short account is given here. Poly(A)+ mRNA was purified from G- or D-expressing cells on an oligo-dT cellulose column (Fig. 1A), denatured with methylmercury and size-fractionated on a sucrose gradient. cDNA was made from the selectively enriched high molecular weight mRNA and used as a probe in a differential screening of an EcoRI library. Plaques positive with the cDNA D probe and negative with the cDNA G probe were selected and subcloned, and their phage DNA was purified and analyzed by restriction mapping. An EcoRI fragment of 1.6 kb was subcloned in pUC18. This recombinant plasmid (pEDx) was previously referred to as pED1 when subcloned in pBR328 [24]. Used as a probe on a Northern blot of poly(A)+ mRNA of G- or D-expressing paramecia (Fig. 1), pEDx hybridizes specifically with the intense band, and not with the weak band, in poly(A)+ mRNA from D-expressing cells (see text and Fig. 1B, lane 2).

Genomic library construction. Total Paramecium DNA was partially digested with EcoRI, and DNA fragments in the 15- to 25-kb range were size-fractionated on a low melting point agarose gel and purified by the agarase method prior to ligation with λEMBL3 vector arms. Thirty thousand plaques were screened with pEDx, and 82 positive plaques were selected at random, without any size discrimination, and purified by another round of hybridization.



BOURGAIN-GUGLIELMETTI & CARON-D SURFACE PROTEIN GENES OF P. PRIMAURELIA 







Fig. 1. mRNA characterization of the G and D surface antigens. A. Methylmercury RNA gel analysis of polyadenylated RNA extracted from G- or D-expressing cells. The abundant high molecular weight surface antigen mRNAs are visible by ethidium bromide staining and are indicated by arrows. Based on the sizes of the two ribosomal RNAs, the size of the G mRNA is estimated to be 8000 bases, and the sizes of the two mRNAs specifically expressed in serotype D to be 8000 and 7500 bases. B. The same gel blotted and probed with pEDx, showing the specificity of this probe for the D mRNA.



Isolation of recombinant clones and DNA sequencing. DNA was purified from recombinant phages according to Sambrook [35]. The restriction maps of recombinant phages were determined by the cos mapping technique [34].

Restriction fragments to be sequenced were subcloned into pUC19. The Promega Erase-a-Base kit was used to create a nested set of deletions with exonuclease III (Promega, Madison, WI, USA). The resulting plasmids were transformed into E. coli strain MR32 to produce DNA for standard double-stranded sequencing. Sequencing reactions were performed using the Sequenase DNA sequencing kit version 2.0 (USB, Cleveland, OH, USA). In general, only one strand was sequenced, but whenever the sequence determination was ambiguous, both strands were sequenced. For the contiguous sequences produced by exonuclease III deletions, both strands of the overlapping segments were sequenced.

In region A, the EcoRI restriction fragments of 1.2, 1.6 (EDx), 1.6, 4.3, 2.1 and 3.1 kb from phage λD2 bearing the 156Dα gene (Fig. 2) were entirely sequenced. For the 156Dβ gene, the HindIII restriction fragments of 3.8, 1.3, 0.8, 0.2 and 5.6 kb (Fig. 2) and the 1.4-kb EcoRI restriction fragment of λD24 (Fig. 2) were partially sequenced from their extremities. In the B region, the extremities of the λD81 EcoRI restriction fragments of 5.7, 0.9, 1.6 and 8.9 kb were also sequenced. Each of the HincII-EcoRI fragments of phages λD81, λD19, λD57, λD22 and λD55 was subcloned and entirely sequenced, but only the extremities of the corresponding fragment from λD15 were sequenced.

DNA and protein sequences were analyzed using the University of Wisconsin GCG sequence analysis software package [11] and DNA Strider [19].

DNA sequence accession numbers. The DNA sequences obtained were submitted to both the EMBL Nucleotide Sequence Database and GenBank. Unless otherwise noted, database accession numbers are listed with the EMBL accession number first, followed by the GenBank accession number in parentheses.

PCR amplification. The 25-µl reaction mixtures containing 10-100 ng of genomic DNA, or 0.02-20 ng of λ recombinant DNA as a control, were amplified in a Perkin Elmer Cetus apparatus for 32 cycles (92° C, 1 min; 65° C, 1 min 15 s; 72° C, 1 min 30 s), with a final extension of 10 min at 70° C. The amplified products were used directly for electrophoresis, or after purification with the Qiaquick Spin kit (250) (Qiagen, Chatsworth, CA, USA) to eliminate oligonucleotides of less than 30 bases.
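The cycling program above can be written down as data, which makes it easy to check the programmed run length. This is a minimal sketch: the `Step` dataclass and `total_hold_seconds` are illustrative names, the temperatures and times are transcribed from this paragraph, and ramp times between steps (which depend on the instrument) are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float  # block temperature in °C
    seconds: int   # hold time at that temperature


# Values transcribed from the PCR amplification paragraph
CYCLE = [Step(92, 60), Step(65, 75), Step(72, 90)]
N_CYCLES = 32
FINAL_EXTENSION = Step(70, 600)


def total_hold_seconds(cycle, n_cycles, final):
    """Sum of programmed hold times; ramp times between steps are ignored."""
    return n_cycles * sum(step.seconds for step in cycle) + final.seconds


total = total_hold_seconds(CYCLE, N_CYCLES, FINAL_EXTENSION)
print(total, "s =", total / 60, "min")  # 7800 s = 130.0 min
```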

RESULTS 

Cloning of a probe of the D antigen subfamily. Figure 1A shows an ethidium bromide-stained RNA gel of G- or D-expressing cells. In both cases, a major high molecular weight band corresponding to the abundant mRNA of the expressed surface antigen is detectable [24]. The mRNA from G-expressing paramecia consists of a single band, which has been shown to contain only one mRNA species [23, 24], whereas the mRNA from D-expressing cells consists of two bands: an intense band, which migrates slightly faster than the corresponding G band, and a weaker band, which has the same mobility as the G band. To obtain probes of the G and D surface antigens, we took advantage of the fact that these mRNAs are polyadenylated and of high molecular weight [24, 31]: mRNAs were purified on an oligo-dT cellulose column and size-fractionated on a sucrose gradient [24]. Radioactive cDNAs were prepared from the fractions enriched in surface antigen mRNA with an oligo-dT primer and used as probes in a differential screening of an EcoRI library. Plaques positive with the D probe and negative with the G probe were selected and subcloned, and their phage DNA was purified and analyzed by restriction mapping. An EcoRI fragment of 1.6 kb, pEDx (Fig. 2), common to all these recombinants, was subcloned in pUC18 (the same fragment cloned in pBR328 was mentioned in previous publications as pED1 [23, 24]) and used to probe a Northern blot of poly(A)+ RNA of G- or D-expressing paramecia (Fig. 1B). It hybridizes specifically to the intense band of D-expressing cells and not to the weak band. A G probe cloned in the same way hybridizes only to the single G-specific band [24]. Because these oligo-dT-primed probes represent the 3' region of the mRNAs, this indicates that the weak band of D-expressing cells, although it has the same electrophoretic mobility as the G band, differs in sequence from the 3' region of both the intense-band mRNA and the G mRNA. No attempt has been made to characterize the molecular species contained in this weak band since, if it is the mRNA of a coexpressed surface antigen, it most likely does not belong to the D subfamily.

Cloning and characterization of the D antigen subfamily. Since surface antigen mRNA molecules are long, we needed recombinants with large inserts to cover the whole gene(s). An EcoRI library was constructed from a partial digestion of Paramecium DNA after size selection of fragments in the 15- to 25-kb range and insertion in the λEMBL3 vector. Using pEDx as probe, 82 positive plaques were selected and subcloned, and 30 randomly chosen ones were analyzed by restriction mapping. All these phages can be assigned to two genomic regions, A and B (Fig. 2): 24 phages originate from region A and six from region B. Region A (top of Fig. 2) is more than 30 kb long and is entirely covered by four phages (out of the 24): λD1, λD2, λD3 and λD24. The first three contain an EcoRI fragment S1 (indicated by a thick line in Fig. 2) which, by restriction mapping and DNA sequencing, is identical to pEDx. Phage λD24 contains a HindIII-EcoRI restriction fragment called S2, which is also represented by a thick line in Fig. 2. S2 is the smallest fragment we could find that strongly hybridizes with pEDx (Fig. 2). The absence of EcoRI sites in this S2 fragment indicates that S2 is similar but not identical to pEDx. S1 and S2 are both contained in λD1, showing their proximity in the genome. The large size of region A (> 30 kb) and the presence of two similar sequences strongly suggest the presence of two genes in region A, and indeed we show below by DNA sequencing that two similar D genes, the Dα gene on the right and the Dβ gene on the left, are present in inverted orientations, as shown in Fig. 2 by the two large arrows.

Fig. 2. Restriction mapping of the regions containing the D genes. The two maps of genomic regions A (top) and B (bottom) are shown. Restriction sites are: Ac, AccI; Bg, BglII; E, EcoRI; Hc, HincII; H, HindIII; S, ScaI; X, XbaI. The inserts of phages are shown below the maps. λD1, λD2, λD3 and λD24 cover the A region; λD15, λD81, λD19, λD57, λD22 and λD55, the B region. Three fragments, S1, S2 and S3, shown by thick lines at the 3' end of genes Dα, Dβ and Dγ, respectively, hybridized with probe pEDx. A rough estimate of the extent of similarity between the three genes is represented by the dotted thick lines. The black thick arrows indicate the sequenced Dα gene and the two partially sequenced Dβ and Dγ genes. The two ends of each black arrow are located at the initiator codon ATG and at the stop codon of the coding regions. All sequenced regions are indicated by thick hatched lines on the maps of the phages from which the corresponding restriction fragments used for sequencing were subcloned. The sizes of EcoRI or HindIII restriction fragments are given under each fragment. Probes used in this work (pEDx, BHc, ED7.8) are indicated by thick lines under the corresponding fragment, except for probe ED7.8, which corresponds to the 7.8-kb EcoRI fragment of phage λD55. The right part of the B region map is variable in size due to alternative DNA rearrangements.



The restriction maps of the six phages related to region B are displayed at the bottom of Fig. 2. An EcoRI fragment of 1.6 kb, called S3, is present in all these phages and has a sequence similar but not identical to pEDx since, for instance, pEDx does not contain a ScaI site (S site in Fig. 2). All these phages contain a HincII site designated Hc*. To the left of this site, the restriction maps of these phages correspond to a single map. To the right of the Hc* site, the maps differ (see, for instance, in Fig. 2 the distances between the Hc* site and the next EcoRI site on the right). The identity of the maps to the left of site Hc* suggests that they originate from the same genomic region; an unlikely alternative explanation would be the existence of a region duplicated several times and maintained identical. The common genomic origin of the six phages is further supported by EcoRI restriction mapping of the region around the Hc* site by Southern blot (Fig. 3B), which reveals a 7.5- to 9-kb smeared band and a 12-kb band corresponding to region B. Each of the five phages λD81, λD19, λD57, λD22 and λD55 has an EcoRI fragment containing the Hc* site whose size is in the 7.8- to 8.9-kb range (8.9, 8.5, 8.3, 8.2 and 7.8 kb, respectively) (Fig. 2), whereas λD15 has a 12-kb EcoRI fragment containing the Hc* site. This shows that the B region is heterogeneous and complex, and that the six phages provide a good, albeit incomplete, representation of it. For two phages, λD19 and λD22, the extreme right part of their maps is again identical (a common 3.6-kb EcoRI fragment). Therefore, the variable right part of the maps might reflect variable DNA rearrangements during macronuclear DNA differentiation.

The next step was to determine whether the regions of similarity represented by S1, S2 and S3 were limited to these sequences. For this purpose, we used the series of EcoRI fragments of phage λD2 to probe different DNA digests of the other phages. The results (not shown) can be interpreted in the following way: an 8- to 9-kb region of similarity extends from each of the S1, S2 and S3 fragments; as shown later, these three regions cover the three genes, each region containing S1, S2 or S3 at one end (dotted thick lines in Fig. 2). Outside these regions, the similarity drops. For instance, the two EcoRI fragments (1.2 and 3.1 kb long) (Fig. 2) that frame gene Dα do not hybridize to the corresponding parts of genes Dβ and Dγ; they are unique sequences in the genome. Since pEDx (which is identical to S1) contains the 3' end of the gene as a consequence of the oligo-dT-primed cloning process, the distal positions of S1, S2 and S3 within the three regions of similarity must correspond to the 3' ends of the genes.

How many genes are there in the D subfamily? Three putative D genes, whose sizes are compatible with the size of a surface antigen gene and which contain either pEDx or a sequence similar to it, have been obtained by cloning. Are they the only members of the family? A Southern blot of total DNA digested with EcoRI has been hybridized with pEDx (Fig. 3A). Only two bands (13 kb and 1.6 kb) are present, the 1.6-kb band being more intense than the 13-kb band, consistent with its containing two genes: the 1.6-kb band corresponds to genes Dα and Dγ, whereas Dβ gives the 13-kb band. In addition, the same blot hybridized with the 7.8-kb EcoRI fragment ED7.8 of phage λD55 (Fig. 2) gives a group of bands 4.3, 2.1 and 1.6 kb long that correspond to gene Dα, the 13-kb band corresponding to gene Dβ, and smeared bands extending from 7.5 to 9 kb and around 12 kb corresponding to gene Dγ (Fig. 3B). Various probes used on Southern blots of total Paramecium DNA digested with different restriction enzymes always give a pattern of bands compatible with the restriction maps of the three genes described above (results not shown).
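Checking a Southern pattern against a restriction map amounts to predicting fragment sizes from site positions. The sketch below does this for EcoRI (which cleaves G^AATTC) on a linear sequence. The function name and the toy sequence are hypothetical. On a real genomic blot only fragments overlapping the probe are detected, and the two end fragments extend beyond any sequenced stretch, so they would be longer than predicted here.

```python
import re

ECORI_SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cleaves G^AATTC, one base into the site on the top strand


def ecori_fragments(seq):
    """Lengths of the fragments of a linear sequence cut at every EcoRI site."""
    seq = seq.upper()
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(f"(?={ECORI_SITE})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


toy = "AAAA" + "GAATTC" + "TTTTTTTT" + "GAATTC" + "CC"  # hypothetical 26-bp sequence
print(ecori_fragments(toy))  # [5, 14, 7]
```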

Fig. 3. Analysis of genomic DNA with D probes. Southern blots of EcoRI-cut Paramecium primaurelia strain 156 genomic DNA hybridized with probe pEDx (A) or probe ED7.8 (B). The various fragments indicated by thick arrows have sizes compatible with the three genes whose maps are shown in Fig. 2. The attribution of each fragment to the corresponding gene is shown by thin arrows.

Expression of the D genes. To study the expression of these genes, we needed probes specific for each of the three putative mRNAs. pEDx (and thus S1, which is identical to pEDx) has been entirely sequenced. Partial DNA sequencing of the corresponding sequences (S2 and S3) of the two other genes was carried out, and the sequences were compared with that of pEDx (data not shown). The similarities between these sequences are extremely high (92% for S1 and S3, and 97% for S1 and S2) in the part of the sequences corresponding to the 3' end of the coding sequence, which should be present on a putative mRNA if the gene is expressed. However, three short regions of 23 to 25 bp were chosen for their relatively high number of mismatches between one gene and the other two: Osr1 from S1 (gene Dα) is 23 bp long and differs by two mismatches from the other two sequences, S2 and S3 (Fig. 4), which are identical in this region. The same approach was used for S2 and S3 (genes Dβ and Dγ) with Osr2 and Osr3 (Fig. 4). The corresponding oligonucleotides were synthesized and used as probes on Northern blots of total RNA of D-expressing cells. Care was taken to use stringency conditions suitable for the exclusive hybridization of each oligonucleotide with the corresponding identical sequence. The results shown in Fig. 5 indicate without any ambiguity that only the mRNA of gene Dα is present in the D serotype.
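Choosing gene-specific oligonucleotides rests on counting mismatches between a candidate probe and the homologous stretches of the other genes. The sketch below slides a probe along a target without gaps and reports the best match, then reproduces the mismatch percentages quoted for Fig. 4 (2 of 23, 3 of 23, 4 of 25). The sequences are hypothetical placeholders, not the real Osr1, Osr2 or Osr3 sequences, and the helper names are illustrative.

```python
def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_ungapped_match(oligo, target):
    """Slide the oligo along target (no gaps); return (fewest mismatches, offset)."""
    k = len(oligo)
    best = (k + 1, -1)
    for i in range(len(target) - k + 1):
        d = mismatches(oligo, target[i:i + k])
        if d < best[0]:
            best = (d, i)
    return best


def mismatch_percent(n_mismatches, length):
    return 100 * n_mismatches / length


# Hypothetical 23-mer and a two-substitution variant (not the real Osr1 sequence)
oligo = "ACGTACGTACGTACGTACGTACG"
variant = oligo[:5] + "T" + oligo[6:15] + "A" + oligo[16:]
print(best_ungapped_match(oligo, "TT" + oligo + "GG"))    # (0, 2)
print(best_ungapped_match(oligo, "TT" + variant + "GG"))  # (2, 2)

# Percentages quoted in the Fig. 4 legend
for n, length in [(2, 23), (3, 23), (4, 25)]:
    print(n, "of", length, "->", round(mismatch_percent(n, length)), "%")
```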

Sequences of gene Dα and of the corresponding putative protein. As shown in previous articles, the coding sequence of these surface antigen genes can be determined accurately from the DNA sequence by the sudden drop in AT percentage when entering the coding sequence (and its rise when leaving it) [7, 27]. It can also be determined by the bias in favor of A or T at the third position of each codon, which gives rise to a 3-bp periodicity of AT percentage in the coding sequences. Such a periodicity is completely absent from the flanking noncoding sequences. A 14-kb EcoRI fragment from phage λD2 entirely covering gene Dα has been completely sequenced (see the thick hatched line on the map of phage λD2 in Fig. 2). Various regions along genes Dβ and Dγ have been sequenced as well and compared with the sequence of gene Dα: the boundaries of the coding sequences (start codon ATG and stop codon TGA) have been determined without ambiguity for all three genes using the criteria mentioned above. Within the coding regions, the similarities with Dα are extremely high (on average 95% for gene Dβ and 91% for gene Dγ), but outside these coding regions the percentage of similarity drops, except for a few short stretches of sequence in the 5' and 3' noncoding regions (see below).
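The two criteria used to delimit the coding sequences (a drop in overall AT content, and an AT bias at the third codon position) can be computed directly. The sketch below gives a sliding-window AT profile and the AT fraction at each codon position of a chosen frame. Window size, step and the toy sequences are hypothetical choices for illustration; real analyses would use windows of tens to hundreds of bases.

```python
def at_fraction(seq):
    """Fraction of A and T bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq) if seq else 0.0


def at_by_codon_position(seq, frame=0):
    """AT fraction at codon positions 1, 2 and 3 of the given reading frame."""
    return [at_fraction(seq[frame + p::3]) for p in range(3)]


def sliding_at(seq, window=99, step=33):
    """(start, AT fraction) for successive windows along seq."""
    return [(i, at_fraction(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


# Hypothetical toy: AT-rich flank, codons ending in A or T, AT-rich flank
flank = "AATATTTAAT" * 3
coding = "GCAGGTCCA" * 4
print(sliding_at(flank + coding + flank, window=30, step=15))
print(at_by_codon_position(coding))  # [0.0, 0.0, 1.0]
```

A third-position value well above the first two in one frame, and not in the flanks, is the 3-bp periodicity signal described in the text.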

Fig. 4. Oligonucleotides specific for each D gene. The sequences of oligonucleotides Osr1, Osr2 and Osr3 are shown and aligned with the corresponding sequences of the other two genes. Osr1 in region S1 is 23 bases long, and the corresponding sequences in S2 and S3 present two mismatches (9%), shown in small bold type. The oligonucleotides Osr2 (23 bases) and Osr3 (25 bases) in regions S2 and S3 show three-base (13%) and four-base (16%) differences, respectively, compared with the other two sequences. (Conditions for the specific hybridization of these oligonucleotides to the exact complementary sequence are given in the text.)

The open reading frame of gene Dα is 7632 bp long and does not contain introns. This size is compatible with the experimental size of the mRNA determined from the Northern blot of Fig. 1 (about 7500 bases). The coding sequence of the 156G gene of P. primaurelia is 8145 bp long. This difference of about 500 bp (513 bp) correlates well with the size difference between the G and D mRNAs (about 8000 and 7500 bases, respectively).
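The length figures in this section can be cross-checked by translation. Paramecium uses the ciliate nuclear genetic code (NCBI translation table 6), in which TAA and TAG encode glutamine and TGA is the only stop codon. The sketch below builds that table from the standard code and assumes that the 7632-bp ORF includes the TGA stop, which is consistent with the 2543-residue protein reported here. Function names and the toy ORF are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard code in TCAG order (NCBI table 1); '*' marks stop codons
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
# Ciliate nuclear code (NCBI table 6): TAA and TAG read as glutamine; TGA is the only stop
CODON_TABLE.update({"TAA": "Q", "TAG": "Q"})


def translate(orf):
    """Translate an in-frame DNA sequence up to (not including) the first TGA."""
    protein = []
    for i in range(0, len(orf) - len(orf) % 3, 3):
        aa = CODON_TABLE[orf[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def protein_length(orf_bp):
    """Residues encoded by an ORF whose length includes the stop codon."""
    return orf_bp // 3 - 1


print(translate("ATGTAATAGTGTTGA"))  # MQQC (toy ORF: TAA and TAG read as Q)
print(protein_length(7632))          # 2543
print(8145 - 7632)                   # 513
```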

From the nucleotide sequence of gene 156Dα, the amino acid sequence of the putative protein has been deduced and is displayed in Fig. 6. The protein is 2543 amino acids long and has a calculated molecular mass of 267 kDa, which is close to the experimental value, suggesting that this protein undergoes only limited maturation. The amino acid composition was deduced from the amino acid sequence using the deviant genetic code of Paramecium [7]. This protein is rich in cysteine (10.5%), threonine (13.5%) and serine (9.1%), a property shared with all other sequenced surface antigens [25-27, 36]. Because the amino acid composition of the 156Dα protein has not yet been determined experimentally, we compared the deduced amino acid composition with the experimental amino acid compositions of two alleles of the 156Dα gene, 90D and 178D [16]. This comparison is possible because the amino acid sequences of two alleles of a surface antigen gene are extremely similar (for instance, the G genes from strains 156 and 168 have 93% similarity [26]) and because the amino acid compositions of all sequenced surface antigens are extremely homogeneous [27]. A χ² test was performed to compare the amino acid composition of the putative 156Dα protein with the average of the two experimental amino acid compositions of 90D and 178D. The contribution of each amino acid to the χ² value is low (between 0.2 and 1.9) except for serine and glycine. The same discrepancy for serine and glycine was previously observed for the 156G protein [27].
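The composition comparison described above can be sketched as follows: compute residue percentages, average the two experimental compositions, and compute a per-residue χ² contribution. The text does not state whether counts or percentages entered the test, so this sketch assumes Pearson's form (O - E)²/E on residue counts, with the expected values scaled from the averaged percentages. All numbers in the usage lines are hypothetical, not the 90D/178D data.

```python
from collections import Counter


def composition_percent(protein):
    """Percentage of each residue in a one-letter protein sequence."""
    counts = Counter(protein)
    n = len(protein)
    return {aa: 100 * c / n for aa, c in counts.items()}


def average_composition(comp_a, comp_b):
    """Residue-by-residue mean of two percentage compositions."""
    keys = set(comp_a) | set(comp_b)
    return {k: (comp_a.get(k, 0) + comp_b.get(k, 0)) / 2 for k in keys}


def chi2_contributions(observed_counts, expected_percent):
    """Pearson term (O - E)^2 / E per residue, E scaled to the observed total."""
    n = sum(observed_counts.values())
    contributions = {}
    for aa, pct in expected_percent.items():
        expected = n * pct / 100
        if expected == 0:
            continue  # a zero expectation would make the term undefined
        observed = observed_counts.get(aa, 0)
        contributions[aa] = (observed - expected) ** 2 / expected
    return contributions


# Hypothetical numbers only (not the 90D/178D data)
expected = average_composition({"C": 10.0, "T": 14.0, "S": 76.0},
                               {"C": 11.0, "T": 13.0, "S": 76.0})
observed = Counter({"C": 105, "T": 135, "S": 760})
print(chi2_contributions(observed, expected))
```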
Fig. 5. Determination of the expressed D gene. Northern blots of total RNA of G- or D-expressing paramecia. The positions of the G or D mRNAs are indicated by arrows. On the left, G total RNA is hybridized with oligonucleotide OSG, specific for G mRNA (OSG is a sequence at the 3' end of the coding sequence of the G gene: 5'-ACGATGACCCTTAATCGTAGACGATGTTTATTATTA-3'), as a positive control for hybridization. On the right, the same experiment is done with D total RNA and the three oligonucleotides shown in Fig. 4, i.e. Osr1 in A, Osr2 in B and Osr3 in C. For each oligonucleotide, the membranes were washed at three temperatures (T1 = 45° C, T2 = 50° C and T3 = 55° C) and exposed for the same length of time (22 h).



Fig. 6. Pseudoperiodic structure of the 156Dα protein. The complete amino acid sequence is represented in such a way that each line corresponds to one period, as in Prat et al. [27]. Each period contains eight cysteine residues, except for the two ends and five incomplete periods (lines 8, 17, 19, 28 and 34). The database accession number of the 156Dα gene is a562F34s (X96616).

The amino acid sequence of the 156Dα protein displays the usual pseudoperiodicity typical of Paramecium surface antigens (156G [27] and 168G [26] of P. primaurelia; 51A and 51C [25], 51B [36], and 51Dα (see accompanying paper) of P. tetraurelia). The positions of the cysteine residues are well conserved and are aligned vertically in Fig. 6 to emphasize this periodicity. However, in 156Dα the pseudoperiodicity appears less regular than in 156G, 168G and 51A, and the vertical alignment in Fig. 6 is less clear [25-27]; also, the distance between two cysteine residues is sometimes extremely large (line 8), and some half periods contain three, five or seven cysteine residues (lines 8, 27, 28 and 29) instead of four as in the three above-mentioned proteins. As in the 51C protein [25], a remarkable feature is the absence of the central repeats observed in the 156G, 168G, 51A and 51B proteins [25-27, 36]. This again supports the view that two kinds of surface antigen structures exist: with or without central repeats. In Fig. 7, matrices of identity between the 156Dα protein and various other surface antigen sequences are displayed. In a, the comparison of 156Dα with itself shows the absence of central repeats but the presence of a repeated motif at both ends of the protein; the latter appears to be a distinctive feature of the 156Dα protein. In b, two D surface protein sequences, 156Dα of P. primaurelia and 51Dα of P. tetraurelia (see accompanying paper), are compared: the continuity of the diagonal illustrates the similarity (82%) of the two sequences. In c and d, the comparisons with two surface antigen sequences with (156G [27]) or without (51C [25]) central repeats show that in both cases the sequences are similar except for a central part representing roughly one third of the sequence. This study and previously published ones show that all these surface protein sequences are most similar at both ends and can display large sequence variations in the central part [25, 26, 36].
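Matrices of identity like those of Fig. 7 are dot plots: a mark is placed wherever a window of one sequence matches a window of the other well enough. Internal repeats then appear as off-diagonal lines in a self-comparison, and a break in the main diagonal marks a divergent central region. The sketch below uses an ungapped window with a match threshold; the window size, threshold and toy "protein" are hypothetical, and the program actually used for Fig. 7 is not specified here.

```python
def dot_plot(seq_a, seq_b, window=3, min_matches=3):
    """(i, j) pairs where seq_a[i:i+window] and seq_b[j:j+window]
    agree at >= min_matches positions (ungapped comparison)."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        wa = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            wb = seq_b[j:j + window]
            if sum(x == y for x, y in zip(wa, wb)) >= min_matches:
                hits.append((i, j))
    return hits


def render(hits, len_a, len_b, window):
    """Text rendering: rows follow seq_a, columns follow seq_b, '*' marks a hit."""
    rows, cols = len_a - window + 1, len_b - window + 1
    grid = [["."] * cols for _ in range(rows)]
    for i, j in hits:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


# Hypothetical toy "protein" with the same motif at both ends
seq = "ABCXABC"
print(render(dot_plot(seq, seq), len(seq), len(seq), 3))
# *...*
# .*...
# ..*..
# ...*.
# *...*
```

The two off-diagonal marks in the corners correspond to the repeated end motif, the same kind of signal described for panel a.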

Comparison of the 5' and 3' noncoding sequences. In Fig. 8,
the 5' and 3' noncoding sequences of various surface antigen genes,
upstream of the initiator ATG and downstream of the TGA
stop codon, are aligned. As mentioned by previous authors [32],
three consensus sequences present in all surface antigen genes,
two at positions -10 and -60 upstream of the ATG codon and one
30 bp downstream of the TGA stop codon, are also present in the
three D genes studied here. No motif common to only a subset of
these sequences has been found. Also, these motifs are absent
from other Paramecium genes already sequenced [10, 18]. This
suggests that these common motifs could be binding sites for
transcription factor(s) necessary for general (and not specific)
surface antigen gene expression.
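The positions quoted above follow the coordinate convention of Fig. 8 (position -1 immediately before the initiation codon, +1 at the stop codon). A minimal Python sketch of that bookkeeping follows, assuming +1 denotes the first base of the stop codon. The consensus sequences themselves are not reproduced in this section, so `TTTAA` is a purely hypothetical placeholder motif, and the toy sequences are invented.

```python
def find_all(seq: str, motif: str) -> list[int]:
    """0-based start indices of every (possibly overlapping) occurrence of motif."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def upstream_coordinates(seq: str, atg_index: int, motif: str) -> list[int]:
    """Fig. 8A convention: the base immediately before the ATG is position -1."""
    return [s - atg_index for s in find_all(seq, motif) if s < atg_index]


def downstream_coordinates(seq: str, stop_index: int, motif: str) -> list[int]:
    """Fig. 8B convention, assuming +1 is the first base of the stop codon."""
    return [s - stop_index + 1 for s in find_all(seq, motif) if s >= stop_index]


# Hypothetical placeholder motif and toy flanking sequences.
MOTIF = "TTTAA"
five_prime = "GGGGG" + MOTIF + "GGGGG" + "ATG"
three_prime = "CCC" + "TGA" + "C" * 26 + MOTIF
print(upstream_coordinates(five_prime, five_prime.index("ATG"), MOTIF))      # [-10]
print(downstream_coordinates(three_prime, three_prime.index("TGA"), MOTIF))  # [30]
```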

A region of variable DNA rearrangement is present at the 5'
end of gene Dγ. As previously mentioned (Fig. 2), a region of
variable size is present at the 5' end of gene Dγ; it most
likely arises from variable DNA rearrangements that occur
during macronuclear development. Six versions are represented
by the six recombinant phages λD15, λD81, λD19, λD57, λD22
and λD55, and the variable region is contained in an Hc*-EcoRI
fragment (Fig. 2). To determine the nature of these rearrangements,
this Hc*-EcoRI fragment has been sequenced for each
of the six recombinant phages. In Fig. 9, the sequences have
been aligned vertically where they are strictly identical. At
first glance, all these sequences appear to share large regions of
identity, but some subsequences present in some recombinants are
absent in others, as if they had been eliminated. The micronuclear
DNA corresponding to these macronuclear regions has not
yet been cloned. Therefore, the only analysis we could perform
was to determine the order of the subsequences of these six
macronuclear versions as they occur in the macronucleus. For
instance, the 200-bp segment SG1 (Fig. 9), which is present only in λD57,
could be located at various positions in macronuclear DNA and
not necessarily at the position shown in Fig. 9. However, this
position appears to be correct, since PCR products amplified
from genomic DNA using O1 and O2 as primers (Fig. 9) do not
hybridize with O4, which is contained in SG1, suggesting that
O4 is indeed to the right of O1 and O2. Moreover, PCR
amplification of macronuclear DNA with oligonucleotides O3 and O6
(see Fig. 9 for their location) reveals the presence of smeared bands that

Fig. 7. Dot matrix comparison of the 156Dα deduced amino acid
sequence with other surface antigen sequences. A. 156Dα versus 156Dα.
B. 156Dα versus 51Dα. C. 156Dα versus 156G. D. 156Dα versus 51C.
DNA and protein sequences were analyzed using the program "DNA
Strider" [19]. The accession number of the 156Dα gene is a562F34s
(X96616).



hybridize with SG1. This shows that SG1 is between O3 and
O6, but also that in the macronucleus the rearrangements are
much more complex than what is represented by those six phages.
This high degree of complexity is also revealed at various other
positions in this region; for instance, PCR amplification using
O3 and O4 as primers also gives multiple discrete bands. However,
no PCR product was obtained with O4 and O5, showing
that SG2 is to the right of SG1 (Fig. 9). Indeed, PCR amplification
with O5 and O6 gives a unique band of 340 bp, as in
phage λD22.
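The ordering argument above rests on which primer pairs yield a product and of what size. A toy "in silico PCR" in Python makes that logic explicit. The sequences of O1 to O6 are not given in this section, so the primers and templates below are hypothetical, and the sketch only considers exact matches on one strand. A genomic template that mixes several macronuclear versions would return several sizes, which on a gel would appear as multiple or smeared bands.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, sub: str) -> list[int]:
    hits, start = [], seq.find(sub)
    while start != -1:
        hits.append(start)
        start = seq.find(sub, start + 1)
    return hits


def predicted_products(template: str, fwd: str, rev: str) -> list[int]:
    """Sizes (bp) of products expected from two primers written 5'->3'.

    The forward primer must match the top strand exactly; the reverse primer
    anneals to the top strand as its reverse complement. Mismatches, the
    opposite strand and primer efficiency are all ignored.
    """
    rev_site = reverse_complement(rev)
    sizes = []
    for f in find_all(template, fwd):
        for r in find_all(template, rev_site):
            if r >= f + len(fwd):  # reverse site lies downstream of the forward primer
                sizes.append(r + len(rev_site) - f)
    return sorted(sizes)


# Hypothetical primers and two toy macronuclear versions of the same region.
fwd = "GATTACAGGC"
rev = reverse_complement("CCTTGAACTG")
long_version = "AAA" + fwd + "T" * 20 + "CCTTGAACTG" + "AAA"
short_version = "AAA" + fwd + "T" * 5 + "CCTTGAACTG" + "AAA"  # 15 bp eliminated
print(predicted_products(long_version, fwd, rev))   # [40]
print(predicted_products(short_version, fwd, rev))  # [25]
```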

The fact that the sequences in this region, when common to
multiple phages, are identical strongly suggests that they arise
from the same micronuclear sequence by alternative elimination
of internal sequences called IES (for a review, see [33]). Also
in favor of a variable elimination of different IES is the presence
of a 5'-TA-3' dinucleotide at each interruption shown by a dotted
line in Fig. 9 (results not shown). The sequences from each
recombinant phage bordering the 130-bp SG3 segment, present
in λD15, λD19 and λD55 but absent in λD81, are shown in
Fig. 10A. The comparison of λD81, where the 130-bp sequence
is absent, with the other three (λD15, λD19 and λD55), where
it is present, shows a 5'-TA-3' repeat at both
ends of the eliminated sequences. The fact that 5'-TA-3' is repeated
twice in λD55 and only once in λD15 and λD19 can be
explained by the model shown in Fig. 10B, where two left IES
boundaries are used alternatively. A complete description of the
156Dα  TTTTATCTCT CAATAGAGGG TTTTTTAATA AATATAAAAC TCTTAAAATT
156Dβ  TTATCTCTTA ATAAGGGTGC TTTAGTAATA TTAAA
156Dγ  TATCTCTTAA TAGAGGTGTT TTAGTAGATG TTAAATTCGT AAAATTAATA
51Dα   ATTAATTTAA TCTTTAAATA AGATTGTTTC AAAAATAAAA TCTAAATCAT

Fig. 8. Comparison of 5' and 3' noncoding sequences. Noncoding
sequences in the 5' (A) and 3' (B) parts of the 156Dα, Dβ, Dγ and 51Dα
genes were aligned using the GCG program. A. Position -1 is
immediately before the translation initiation codon. B. Position +1 is the
first stop codon. The conserved sequences are underlined. The accession
numbers are: a562F34s (X96616), 156Dα upstream and downstream
noncoding sequences and coding sequence; R436K87I (X96626) and
o456b63N (X96627), 156Dβ 5' and 3' noncoding sequences,
respectively; u518a81y (X96629) and F478U10W (X96628), 156Dγ 5' and
3' noncoding sequences, respectively.



DNA rearrangement will have to await the cloning of the cor- 
responding micronuclear copy. 
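The TA bookkeeping behind Fig. 10 can be illustrated with a toy excision function. In Paramecium, IES are bordered by a 5'-TA-3' at each end and only one TA copy remains after excision [1, 30, 37, 39]. The Python sketch below assumes exactly that rule; the micronuclear sequence has not been cloned, so the toy sequence is hypothetical. It shows that when the left border sits in a 5'-TATA-3', using one TA or the other as the left boundary leaves either TA or TATA at the junction, in the spirit of the Fig. 10B model.

```python
def excise_ies(seq: str, left: int, right: int) -> str:
    """Remove a toy IES whose borders are the TA at seq[left:left + 2] and the
    TA at seq[right - 2:right]; a single TA copy is left at the junction."""
    if seq[left:left + 2] != "TA" or seq[right - 2:right] != "TA":
        raise ValueError("IES boundaries must be 5'-TA-3' dinucleotides")
    return seq[:left] + seq[right - 2:]


# Hypothetical micronuclear toy: flank, TATA, IES body, TA, flank.
mic = "GGCC" + "TATA" + "GCGCGC" + "TA" + "CCGG"
print(excise_ies(mic, 4, 16))  # GGCCTACCGG   (outer TA used: one TA remains)
print(excise_ies(mic, 6, 16))  # GGCCTATACCGG (inner TA used: TATA remains)
```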

Caryonidal variation of the variable DNA rearrangement. In
many cases, variable DNA rearrangements in Paramecium display
caryonidal variation [6, 17]. This is also true for this
region: macronuclear DNA from different caryonidal clones, cut
with EcoRI, was hybridized on Southern blots (Fig. 11) with the
BHc probe shown in Fig. 2, which covers the 5' end of gene Dγ
at the border of the variable DNA rearrangement region. A 12-kb
band represented by λD15 and a smeared 7.5- to 9-kb band
represented by the other five (λD81, λD19, λD57, λD22 and
λD55) are present in all caryonides, but with different intensities.
More precisely, a low intensity of the smeared band correlates
with a higher intensity of the 12-kb band. This again supports
the idea that these macronuclear versions arise from the same
micronuclear locus and that the macronuclear versions giving
rise to the 12-kb EcoRI band could be versions where the right
Fig. 9. Map of the variable region. Alignment of the sequences to the right of the Hc* site (Fig. 2) of the six representative phages: λD15,
accession number B545F67o (X96630); λD81, accession number p590L20n (X96635); λD19, accession number Q566R53h (X96631); λD57,
accession number E587ql3S (X96634); λD22, accession number F572A59u (X96632); λD55, accession number G579V15t (X96633). Vertical
bars indicate positions where sequence identity is interrupted. Unique sequences SG1 and SG2 were ordered by PCR using oligonucleotides O1,
O2, O3, O4, O5 and O6 (boxed areas). The 5'-3' orientation of the primers is indicated by an arrow. The sequences of the EcoRI extremity of
λD81, λD19, λD57 and λD22 are identical, but different from the λD15 and λD55 sequences (not shown), which also differ from each other.



EcoRI site present in the others is absent due to sequence
elimination. No attempt has been made to determine the overall
extent of this region of alternative DNA rearrangements.
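Why the loss of an EcoRI site would produce a larger band can be shown with a small digestion sketch. EcoRI recognizes 5'-GAATTC-3' and cuts between G and A; the Python function below returns fragment lengths for a complete digest of a linear sequence. The toy sequences are hypothetical and stand in for the kilobase-scale macronuclear versions, which are not reproduced here: removing the site by elimination merges two fragments into one longer one.

```python
ECORI_SITE = "GAATTC"  # EcoRI cuts G^AATTC


def ecori_fragment_lengths(seq: str) -> list[int]:
    """Fragment lengths after complete EcoRI digestion of a linear sequence
    (top-strand cut positions only; sticky-end overhangs are not modeled)."""
    cuts = []
    start = seq.find(ECORI_SITE)
    while start != -1:
        cuts.append(start + 1)  # cut between the G and AATTC
        start = seq.find(ECORI_SITE, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy versions: the second has lost its EcoRI site by elimination.
with_site = "A" * 10 + "GAATTC" + "C" * 10
site_eliminated = "A" * 10 + "C" * 10
print(ecori_fragment_lengths(with_site))        # [11, 15]
print(ecori_fragment_lengths(site_eliminated))  # [20]
```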

DISCUSSION 

A subfamily of three surface antigen genes of the D type, Dα,
Dβ and Dγ, has been found in strain 156 of Paramecium
primaurelia. Only one of them (the Dα gene) is expressed in the D serotype,
and it corresponds to the major high molecular weight mRNA
species. A minor species in the same molecular weight range,
Fig. 10. Sequences of λD15, λD19, λD55 and λD81 in the variable
region. The sequence of the SG3 junctions is shown in A. The direct repeats
5'-TATA-3' on each side of SG3 in λD55 are framed, and a 5-bp
palindromic sequence is indicated by the two arrows. The dinucleotide
5'-TA-3' is missing in λD15 and λD19, so that the direct repeat is only
5'-TA-3'. In λD81 the 130-bp SG3 segment is absent and one of the direct
repeats 5'-TATA-3' is retained. Continuous lines symbolize identical
sequences. A possible interpretation of the observed polymorphism at
the left SG3 junction is shown in B. Two elimination processes from a
noncloned version of this region can explain the presence either of a
5'-TA-3' in λD15 and λD19 or of 5'-TATA-3' in λD55.



but with a slightly lower electrophoretic mobility, has been
detected. It could be another surface antigen mRNA coexpressed
with the D surface antigen, but in this case the failure of its 3'
part to cross-hybridize, even at low stringency,
with G or D surface antigen probes suggests that it probably
belongs to another family. However, if this minor species were
the mRNA of a surface antigen, this would not be surprising.
Indeed, similar cases of co-transcription of surface protein RNA
have been described in the literature; for instance, in P.
tetraurelia, the mRNA of the C surface protein is present in the
cytoplasm of cells expressing the H serotype, but the C protein is
not detected at the cell surface [14]. Also, P. Margolin [20] has
shown that in strain 172 of P. tetraurelia, surface antigen
M is often weakly coexpressed with D. The possibility that the minor species
could be the mRNA of a surface antigen is reinforced by protein
analysis [4] of D-expressing cells. In the high molecular weight
range, a major band corresponding to the major mRNA can be
seen along with a minor band migrating slightly more slowly
that could correspond, based on its relative intensity, to the
minor mRNA species. Both are membrane proteins, since they
react immunologically with an anti-CRD monoclonal antibody.

The sequence of the 156Dα protein is given in Fig. 6. It shows
the same structural features as previously published
surface antigen sequences, namely a highly periodic structure
revealed by the regular position of eight cysteine residues per
period [27]. Two types of surface antigen gene structures have
been described in the literature: one that contains central repeats,
like 156G and 168G in P. primaurelia [26, 27] or 51A [25] and
51B [36] in P. tetraurelia, and another that does not contain
these repeats, making the amino acid sequence slightly shorter,
Fig. 11. Caryonidal analysis of the variable region. A Southern blot
of EcoRI-restricted genomic DNA from different caryonidal cultures
(1-10) of P. primaurelia strain 156 was hybridized with probe BHc,
which is common to all six cloned versions in the 5' region of the Dγ
gene (Fig. 2). The arrows indicate the position of the 12-kb band and
the 7.5- to 9-kb smear. Samples 1 to 10 are DNA.
such as 51C [25]. In this respect, the 156Dα protein belongs to
the second category, but the resemblance stops there, since the
identity matrix between 51C and 156D displayed in Fig. 7D
shows only a few randomly scattered stretches of identity. The
sequences of the D surface proteins of P. primaurelia and P.
tetraurelia show remarkable similarity (Fig. 7B), a feature that
has already been noticed between surface antigens of the two
varieties: for instance, A from P. tetraurelia and G from P.
primaurelia. Moreover, two regions at the NH2 and COOH
termini, occupying rather symmetrical positions in the two D
sequences, clearly appear to be made of a repeated motif, as
shown by the line segments equidistant from and parallel to the main
diagonal (see the identity matrices of Fig. 7A, B). In addition, these
two motifs appear to be similar, as shown by the increased number
of similarity segments in the upper right corner of the
same figure.
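The periodicity described at the start of this paragraph, and the irregularities noted for 156Dα (unusually long gaps between cysteines, half periods with three, five or seven cysteines), can be checked with simple cysteine bookkeeping. The Python sketch below is illustrative only: the paper defines periods by alignment, following Prat et al. [27], so the `period_starts` boundaries and the toy sequence here are hypothetical inputs, not a period-detection algorithm.

```python
def cysteine_positions(protein: str) -> list[int]:
    """0-based positions of cysteine (C) residues."""
    return [i for i, aa in enumerate(protein) if aa == "C"]


def cysteine_spacings(protein: str) -> list[int]:
    """Distances between consecutive cysteines; an unusually large value flags a long gap."""
    pos = cysteine_positions(protein)
    return [b - a for a, b in zip(pos, pos[1:])]


def cysteines_per_period(protein: str, period_starts: list[int]) -> list[int]:
    """Cysteine count in each period, given hypothetical 0-based period start indices."""
    bounds = list(period_starts) + [len(protein)]
    return [protein[a:b].count("C") for a, b in zip(bounds, bounds[1:])]


toy = "ACAACAACAACA"  # hypothetical toy sequence, not part of 156Dα
print(cysteine_spacings(toy))             # [3, 3, 3]
print(cysteines_per_period(toy, [0, 6]))  # [2, 2]
```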

We have shown that only three genes belong to the D subfamily
(Figs. 2, 3). The other two, nonexpressed D genes, Dβ and
Dγ, are extremely similar to the expressed Dα gene, at least
along the parts that have been sequenced. Also, along those same
sequences, their open reading frames are not interrupted by
stop codons, suggesting that they are not pseudogenes. These
three macronuclear genes must arise from three different
micronuclear genes and not from alternative processing of a
unique micronuclear gene, since they differ by numerous point
mutations all along their sequences. Two interpretations of the
existence of these three very similar surface antigen genes can
be put forward at this stage. First, these three genes could have
arisen by recent duplication events, and indeed two
of them, Dβ and Dγ, are physically close to each other. In this
respect, the three genes would represent isoforms of the D
subfamily. This would be a new and remarkable feature of
surface antigen genes, since for each of the other
characterized serotypes a unique gene has been cloned, in perfect
agreement with genetic data showing these genes to be in single
copy in the genome (for a review, see [29]). Second, Dα would
be the unique gene of the D serotype, and the Dβ and Dγ
genes could be assigned to two other serotypes closely related
to D. They could be the analogs in P. primaurelia of surface
antigens J and M, which have been characterized in P. tetraurelia
as immunologically related to D [28]. It is worth mentioning
here that two surface antigens can have very homologous
sequences without belonging to the same serotype: a good example
is given by the two alleles 156 and 168 of the G surface antigen;
the determinants of the serotypes have been shown to lie within
the central repeats, which are the only part of the two sequences
that differs significantly [8]. These central repeats represent a
short amino acid motif (74 amino acids for 156 and 73 for
168) compared with the length of the whole sequence (2715 amino
acids for 156 and 2704 for 168). Apart from this example, we
are completely ignorant of what, in a surface antigen amino
acid sequence, determines the serotype. Therefore, the genetic
data mentioned above simply show that the D serotype
corresponds to the expression of the Dα gene alone.
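Whether a reading frame counts as "not interrupted by stop codons" depends on the genetic code. In the Paramecium nuclear code, TAA and TAG encode glutamine and only TGA is a stop codon, consistent with the TGA stop codon discussed above. The Python sketch below scans frame 1 of a sequence for in-frame stops under either code; it assumes the partial sequence starts in frame, and the toy ORF is hypothetical.

```python
PARAMECIUM_STOPS = frozenset({"TGA"})  # TAA and TAG encode glutamine in Paramecium
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})


def in_frame_stops(cds: str, stops: frozenset[str] = PARAMECIUM_STOPS) -> list[int]:
    """0-based codon indices of stop codons in reading frame 1
    (any trailing partial codon is ignored)."""
    usable = len(cds) - len(cds) % 3
    return [i // 3 for i in range(0, usable, 3) if cds[i:i + 3] in stops]


toy = "ATG" + "TAA" + "TAG" + "GCT" + "TGA"  # hypothetical toy ORF
print(in_frame_stops(toy))                  # [4]: only the terminal TGA
print(in_frame_stops(toy, STANDARD_STOPS))  # [1, 2, 4] under the standard code
```

Applied naively with the standard code, a Paramecium gene would appear to be riddled with premature stops, which is why the code matters when judging whether Dβ or Dγ might be pseudogenes.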

The noncoding sequences of genes Dα, Dβ and Dγ, either at
the 5' or at the 3' end, are very similar (Fig. 8). In cases where
the expression of surface antigens has been tested, primarily by
microinjection of a plasmid containing a defined surface antigen
gene, not only the expression of this surface antigen but also the
regulation of its expression as a function of temperature has
been found to be controlled by sequences within the gene itself
or within a few hundred base pairs at both 5' and 3' ends [15,
21, 22]. The fact that the coding sequences of genes Dα, Dβ and
Dγ and their 5' and 3' noncoding sequences are very similar, but
that only one of these three genes is expressed, indicates that
some subtle changes in their sequences may control expression.
Microinjection of the two nonexpressed genes Dβ and Dγ into
the macronucleus will be of great interest.
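Statements that two sequences are "very similar", or quoted identity percentages, depend on how the alignment is scored. The paper does not state its convention here, so the Python sketch below uses one common choice as an assumption: identity is computed over aligned columns where neither sequence has a gap ('-'). The toy alignments are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if a == b)
    return 100.0 * same / len(pairs)


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
print(percent_identity("ACG-T", "ACGAT"))            # 100.0 (gap column skipped)
```

Counting gap columns as mismatches instead would lower the second value to 80.0, so the convention should be stated alongside any percentage.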

A region of variable sequence has been identified upstream
of the Dγ gene, and the sequences of six versions of this region
have been partially determined. Their structures strongly suggest
that they are derived from the same micronuclear copy by
alternative DNA sequence elimination. Indeed, the comparison
of the sequences shows the systematic presence of a 5'-TA-3' at
each point where different versions start to diverge in
sequence. Elimination of IES has already been reported by
different authors in both P. primaurelia and P. tetraurelia, and of
the two 5'-TA-3' dinucleotides that border an IES, only one is
conserved after elimination [1, 30, 37, 39]. Therefore, alternative
IES elimination could account for the results obtained.
However, we have cloned only six versions, which have been
partially sequenced, and Southern blots of this region show a large
heterogeneity of variable versions: this suggests a much larger
number of possible versions. Also, the extent of this variable
macronuclear region has not yet been determined.
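To see how alternative elimination can generate many more versions than the six cloned ones, consider a toy locus in which each IES is independently either retained or excised: n such IES yield up to 2**n macronuclear versions. The Python sketch below enumerates them. Independence and non-overlap of the IES are simplifying assumptions (the text also considers imbricated IES), TA-boundary bookkeeping is ignored, and the toy locus is hypothetical.

```python
from itertools import product


def alternative_versions(segments: list[tuple[str, bool]]) -> set[str]:
    """Enumerate macronuclear versions from a toy micronuclear locus.

    `segments` is a list of (sequence, is_ies) pairs. Each IES is assumed to be
    retained or excised independently of the others (a simplifying assumption);
    TA-boundary bookkeeping is ignored here.
    """
    ies_indices = [i for i, (_, is_ies) in enumerate(segments) if is_ies]
    versions = set()
    for choice in product((True, False), repeat=len(ies_indices)):
        retained = dict(zip(ies_indices, choice))
        versions.add("".join(seq for i, (seq, is_ies) in enumerate(segments)
                             if not is_ies or retained[i]))
    return versions


toy_locus = [("AAA", False), ("CC", True), ("GGG", False), ("TT", True), ("AAA", False)]
print(len(alternative_versions(toy_locus)))  # 4 = 2**2 versions from two IES
```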

In the absence of the cloned micronuclear copy, it appears
difficult to build a model of DNA processing that would
account for the maintenance of one or two 5'-TA-3' in some
macronuclear versions, as shown in Fig. 10A; but at least three
possible explanations can be considered: an alternative elimination
of a family of IES, either consecutive in the micronuclear genome
or imbricated as in Fig. 10B; a microheterogeneity in the choice
of boundaries; and finally, a microheterogeneity in the junction
created by IES excision that would originate from the
mechanism of excision itself (variable DNA repair before religation,
for instance) and would be reminiscent of some transposon
excision events [40]. The functional role of this region (if any) has not
yet been determined, but it is tempting to consider that it might
be involved in the regulation of Dγ expression or in the as yet
unknown mechanism generating variability of these surface
antigens.